# -=( Install & Load Package Function )=-
install_load <- function (package1, ...) {
# convert arguments to vector
packages <- c(package1, ...)
# start loop to determine if each package is installed
for(package in packages){
# if package is installed locally, load
if(package %in% rownames(installed.packages()))
do.call('library', list(package))
# if package is not installed locally, download, then load
else {
install.packages(package)
do.call("library", list(package))
}
}
}
path <- function(){
gsub ( "\\\\", "/", readClipboard () )
}
#Copy a path, then call path() in the console
#Copy the converted R path and paste it into the desired variable
#Export chart
export.chart <- "C:/Users/Fathan/Documents/Obsidian Vault/2. Kuliah/Smt 5/8. Pengantar Sains Data/Tugas/Tugas Akhir/Chart"
1 Introduction
1.1 Group 3
| Name | Student ID (NIM) |
|---|---|
| Angga Fathan Rofiqy | G1401211006 |
| Muhamad Farras Surya Dio Putra | G1401211018 |
| Salsabila Dwi Rahmi | G1401211026 |
| Dhiya Khalishah Tsany Suwarso | G1401211038 |
1.2 Background
Natural disasters are natural events that can have a major impact on human populations. According to UCLouvain (2019), Indonesia is one of the countries with the highest number of natural disaster events in the world, second only to the United States.
According to Chaudhary and Piracha (2021), the negative impacts of natural disasters include:
- Food shortages
- Post-disaster trauma
- Large-scale migration
- Financial and economic problems
- Anxiety about the future
Disaster management, or mitigation, is a continuous effort to reduce the impact of disasters on people and property.
1.3 Objectives
This study aims to group the provinces of Indonesia according to the frequency of natural disasters in 2018-2021 using clustering methods.
Benefits for the Government
The results of this analysis are expected to serve as a reference or guideline for the central and regional governments, helping them focus on the measures needed to prevent or mitigate the impact of future disasters.
Benefits for the Public
The results of this analysis are expected to help the Indonesian public prepare better and learn how to cope with natural disasters, based on how frequently natural disasters occur in each province.
1.4 Clustering Methods
1.4.1 Hierarchical
Steps for performing hierarchical cluster analysis:
- Prepare the data. The data must be numeric so that distances can be computed.
- Compute the (dis)similarity, or distance, between every pair of observations in the dataset. The (dis)similarity measure can be chosen to suit the data. The pairwise (dis)similarities are then arranged into a distance matrix.
- Build a dendrogram from the distance matrix using a chosen linkage method. Several linkage methods can be tried and the best dendrogram selected.
- Decide where to cut the tree (at a given (dis)similarity value). This is the step at which the clusters are formed.
- Interpret the resulting dendrogram.
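The steps above can be sketched in base R with `dist()`, `hclust()` and `cutree()`. The toy data are made up for illustration, and choosing the linkage by cophenetic correlation is one common criterion assumed here, not necessarily how this report selects it.

```r
# Step 1: toy numeric data, two well-separated groups of three points (illustrative only)
x <- rbind(c(0, 0), c(0, 1), c(1, 0),
           c(10, 10), c(10, 11), c(11, 10))

# Step 2: pairwise Euclidean distances arranged as a distance matrix
d <- dist(x, method = "euclidean")

# Step 3: build dendrograms with several linkage methods and compare them by
# cophenetic correlation (how well the tree preserves the original distances)
linkages <- c("single", "complete", "average", "ward.D2")
coph <- sapply(linkages, function(m) cor(cophenetic(hclust(d, method = m)), d))
best <- linkages[which.max(coph)]
hc <- hclust(d, method = best)

# Step 4: cut the tree into k = 2 clusters
cl <- cutree(hc, k = 2)
print(round(coph, 3))
print(cl)
```

Step 5, interpretation, would then look at which observations share a cluster and at what height they merge (`plot(hc)` draws the dendrogram).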
1.4.2 K-Means
1.4.3 Fuzzy C-Means
Fuzzy C-Means (FCM) is a soft clustering algorithm proposed by Bezdek (1974; 1981). Unlike K-means, where each data object belongs to exactly one cluster, FCM makes every data object a member of all clusters, with a membership degree between 0 and 1 for each. Data objects closer to a cluster center therefore have higher membership degrees in that cluster than objects scattered near the cluster boundary.
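The membership degrees described above follow from the FCM update rule u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)), alternated with a membership-weighted mean update of the centers. The base-R sketch below implements these two updates with fuzzifier m = 2 on made-up toy data. It illustrates the algorithm only; it is not the `ppclust::fcm()` implementation used later in this report, and the helper names are hypothetical.

```r
# Membership update for Fuzzy C-Means; X: n x p data matrix, V: k x p centers
fcm_membership <- function(X, V, m = 2) {
  # Euclidean distance from every row of X to every center (n x k matrix)
  D <- sapply(seq_len(nrow(V)), function(j) sqrt(rowSums(sweep(X, 2, V[j, ])^2)))
  D <- matrix(D, nrow = nrow(X))
  D <- pmax(D, 1e-12)  # avoid division by zero when a point sits on a center
  p <- 2 / (m - 1)
  # u[i, j] = 1 / sum_k (d[i, j] / d[i, k])^p = 1 / (d[i, j]^p * sum_k d[i, k]^-p)
  1 / (D^p * rowSums(D^-p))
}

# Center update: each center is the mean of all points weighted by membership^m
fcm_centers <- function(X, U, m = 2) {
  W <- U^m
  (t(W) %*% X) / colSums(W)
}

# Alternate the two updates for a fixed number of iterations
fcm_sketch <- function(X, V, m = 2, iter = 50) {
  for (it in seq_len(iter)) {
    U <- fcm_membership(X, V, m)
    V <- fcm_centers(X, U, m)
  }
  list(u = fcm_membership(X, V, m), v = V)
}

# Toy data: two well-separated groups; initial centers taken from the data
X <- rbind(c(0, 0), c(0, 1), c(1, 0), c(10, 10), c(10, 11), c(11, 10))
fit <- fcm_sketch(X, V = X[c(1, 4), ])
round(fit$u, 3)
```

Each row of `fit$u` sums to 1, and a point near one group's center gets a membership close to 1 in that cluster, which is the behavior the paragraph above describes.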
1.4.4 Gaussian Mixture Model (GMM)
This method assumes that the population is a mixture of Gaussian probability distributions, each component having its own distribution parameters (mean and covariance). The Expectation-Maximization (EM) algorithm is one of the most widely used algorithms for fitting mixture models.
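The EM algorithm alternates an E-step, which computes each observation's probability of belonging to each Gaussian component, with an M-step, which re-estimates the mixing proportions, means and standard deviations from those probabilities. The sketch below is a minimal one-dimensional version on simulated data; `em_gmm_1d()` is a hypothetical helper, and `mclust::Mclust()`, used later, handles multivariate data, covariance structures and initialization far more carefully.

```r
# EM for a one-dimensional Gaussian mixture with K = length(mu) components
em_gmm_1d <- function(x, mu, sigma = rep(1, length(mu)),
                      w = rep(1 / length(mu), length(mu)), iter = 200) {
  K <- length(mu)
  for (it in seq_len(iter)) {
    # E-step: responsibility r[i, k] = P(component k | x_i)
    dens <- sapply(seq_len(K), function(k) w[k] * dnorm(x, mu[k], sigma[k]))
    r <- dens / rowSums(dens)
    # M-step: re-estimate mixing proportions, means and standard deviations
    nk <- colSums(r)
    w <- nk / length(x)
    mu <- colSums(r * x) / nk
    sigma <- sqrt(colSums(r * outer(x, mu, "-")^2) / nk)
  }
  list(w = w, mu = mu, sigma = sigma)
}

# Simulated data: two Gaussian groups centered at 0 and 6
set.seed(1)
x <- c(rnorm(200, mean = 0, sd = 1), rnorm(200, mean = 6, sd = 1))
fit <- em_gmm_1d(x, mu = c(-1, 7))
fit
```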
1.5 Data
The data are secondary data from www.bps.go.id, namely the Number of Villages/Urban Villages by Type of Natural Disaster in the Last Three Years (Villages), 2021. The data consist of 34 observations, one for each province in Indonesia.
In addition, data on the number of villages in each province of Indonesia in 2021, also from www.bps.go.id, were used. The counts were standardized by expressing the number of villages affected by each type of natural disaster as a percentage of the total number of villages in the province.
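The standardization can be illustrated with the first row (Aceh) of `Data PSD.csv` and `Data PSD_Desa.csv`. `village_pct()` is a hypothetical helper, not part of the original code. The results match the values in `Data PSD_Perc.csv`, except that the BJR and BB results appear there under swapped column names: 22.02 (floods, 1435 villages in `Data PSD.csv`) is stored as BB, and 1.24 (flash floods, 81 villages) as BJR.

```r
# Hypothetical helper: share of villages affected, as a percentage rounded to 2 decimals
village_pct <- function(counts, total_villages) round(100 * counts / total_villages, 2)

# Aceh: disaster counts from Data PSD.csv (row 1), total villages from Data PSD_Desa.csv
aceh_counts <- c(Gada = 4406, KG = 173, KH = 43, GM = 1, APB = 108, GPL = 106,
                 TSN = 2, GB = 493, BJR = 1435, BB = 81, TL = 198)
aceh_pct <- village_pct(aceh_counts, total_villages = 6516)
aceh_pct
```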
Variables used
| Variable | Notation | Description | Type |
|---|---|---|---|
| Gada | X1 | No natural disaster | Numeric |
| KG | X2 | Drought | Numeric |
| KH | X3 | Forest fire | Numeric |
| GM | X4 | Volcanic eruption | Numeric |
| APB | X5 | Whirlwind / tornado / typhoon | Numeric |
| GPL | X6 | Tidal wave | Numeric |
| TSN | X7 | Tsunami | Numeric |
| GB | X8 | Earthquake | Numeric |
| BB | X9 | Flash flood | Numeric |
| BJR | X10 | Flood | Numeric |
| TL | X11 | Landslide | Numeric |
1.5.1 Data Entry
install_load('rio')
raw.data1 <- import("https://raw.githubusercontent.com/Zen-Rofiqy/STA1381-PSD/main/Tugas/Tugas%20Akhir/Data%20PSD.csv")
raw.data2 <- import("https://raw.githubusercontent.com/Zen-Rofiqy/STA1381-PSD/main/Tugas/Tugas%20Akhir/Data%20PSD_Perc.csv")
raw.data3 <- import("https://raw.githubusercontent.com/Zen-Rofiqy/STA1381-PSD/main/Tugas/Tugas%20Akhir/Data%20PSD_Desa.csv")
1.5.2 Data Checking
Checking data types
str(raw.data1)
## 'data.frame': 35 obs. of 12 variables:
## $ Provinsi: chr "ACEH" "SUMATERA UTARA" "SUMATERA BARAT" "RIAU" ...
## $ Gada : int 4406 3827 513 1224 1001 2644 1195 2093 265 241 ...
## $ KG : int 173 127 43 51 16 98 19 30 0 27 ...
## $ KH : int 43 59 18 194 16 64 4 11 14 57 ...
## $ GM : int 1 82 0 0 0 0 0 0 0 0 ...
## $ APB : int 108 483 248 53 44 90 17 158 77 40 ...
## $ GPL : int 106 78 57 15 2 3 14 35 17 69 ...
## $ TSN : int 2 4 0 0 0 0 0 0 0 0 ...
## $ GB : int 493 964 364 0 36 49 66 47 0 0 ...
## $ BJR : int 1435 732 342 455 476 380 171 328 59 61 ...
## $ BB : int 81 52 65 1 17 36 15 23 0 1 ...
## $ TL : int 198 483 222 21 57 103 81 70 1 25 ...
str(raw.data2)
## 'data.frame': 35 obs. of 12 variables:
## $ Provinsi: chr "ACEH" "SUMATERA UTARA" "SUMATERA BARAT" "RIAU" ...
## $ Gada : num 67.6 62.4 44.3 65.2 64.1 ...
## $ KG : num 2.66 2.07 3.71 2.72 1.02 2.98 1.25 1.13 0 6.47 ...
## $ KH : num 0.66 0.96 1.55 10.34 1.02 ...
## $ GM : num 0.02 1.34 0 0 0 0 0 0 0 0 ...
## $ APB : num 1.66 7.88 21.4 2.83 2.82 ...
## $ GPL : num 1.63 1.27 4.92 0.8 0.13 ...
## $ TSN : num 0.03 0.07 0 0 0 0 0 0 0 0 ...
## $ GB : num 7.57 15.72 31.41 0 2.3 ...
## $ BJR : num 1.24 0.85 5.61 0.05 1.09 1.09 0.99 0.87 0 0.24 ...
## $ BB : num 22 11.9 29.5 24.2 30.5 ...
## $ TL : num 3.04 7.88 19.15 1.12 3.65 ...
str(raw.data3)
## 'data.frame': 35 obs. of 3 variables:
## $ Provinsi : chr "Aceh" "Sumatera Utara" "Sumatera Barat" "Riau" ...
## $ Jumlah Desa: int 6516 6132 1159 1876 1562 3289 1514 2654 393 417 ...
## $ KODE : int 11 12 13 14 15 16 17 18 19 21 ...
All data types are appropriate.
Checking for missing data
sum(is.na(raw.data1))
## [1] 0
sum(is.na(raw.data2))
## [1] 0
sum(is.na(raw.data3))
## [1] 0
There is no missing data.
1.5.3 Frequency
install_load("DT")
datatable(raw.data1, filter = 'top',
options = list(pageLength = 10))
1.5.4 Percentage
datatable(raw.data2, filter = 'top',
options = list(pageLength = 10))
1.5.5 Villages
datatable(raw.data3, filter = 'top',
options = list(pageLength = 10))
1.6 Library
install_load("ppclust", "factoextra", "dplyr", "cluster", "fclust", "psych",
"FactoMineR", "ggplot2", "fmsb")
2 Exploration
2.1 Correlation Between Variables
# Keep the 34 province rows (drop row 35) and drop the Provinsi column
dtx <- raw.data2[-35,-1]
pairs.panels(dtx, method = "pearson", stars=TRUE)
Several pairs of variables show significant linear correlations. The distribution of each variable also tends to be right-skewed: most provinces have low values, while a few provinces have much higher values that form a long tail to the right. This suggests that a small number of extreme high values may strongly influence the data.
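The right skew described above can be quantified with the moment-based sample skewness, where positive values indicate a longer right tail. The `skewness()` helper below is a hypothetical base-R sketch run on toy vectors; applied to this report's data it would be called as `sapply(dtx, skewness)`. The psych package loaded earlier also provides `skew()`, whose default definition may differ slightly.

```r
# Moment-based sample skewness: mean cubed deviation / (mean squared deviation)^(3/2)
skewness <- function(x) {
  x <- x[!is.na(x)]
  dev <- x - mean(x)
  mean(dev^3) / mean(dev^2)^1.5
}

skewness(c(1, 2, 3, 4, 100))  # one large value, long right tail: positive
skewness(c(1, 2, 3))           # symmetric: zero
```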
2.2 Outliers
boxplot(dtx)
As noted in the previous section, the data contain extreme values, or outliers.
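`boxplot()` draws as outliers the points lying more than 1.5 times the interquartile range beyond the box hinges, and `boxplot.stats()` returns those points. The hypothetical `count_outliers()` helper below counts them per column on a toy data frame; on this report's data it would be called as `count_outliers(dtx)`.

```r
# Count, per column, the points that boxplot() would draw as outliers
# (beyond coef * IQR from the hinges, as computed by boxplot.stats())
count_outliers <- function(df, coef = 1.5) {
  sapply(df, function(col) length(boxplot.stats(col, coef = coef)$out))
}

toy <- data.frame(a = c(1, 2, 3, 4, 100), b = c(5, 6, 7, 8, 9))
count_outliers(toy)
```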
3 Clustering
3.1 GMM
library("mclust")
mod1 = Mclust(dtx)
mod1$BIC
## Bayesian Information Criterion (BIC):
## EII VII EEI VEI EVI VVI EEE
## 1 -2662.641 -2662.641 -1844.747 -1844.747 -1844.747 -1844.747 -1769.896
## 2 -2619.910 -2572.704 -1836.728 -1758.811 NA NA -1790.443
## 3 -2505.533 -2471.380 -1841.345 -1686.899 NA NA -1788.481
## 4 -2464.491 -2358.342 -1853.884 -1679.718 NA NA -1814.731
## 5 -2428.799 -2371.459 -1821.136 -1666.039 NA NA -1786.245
## 6 -2398.412 NA -1785.704 NA NA NA -1768.457
## 7 -2422.417 NA -1788.439 NA NA NA -1792.576
## 8 -2435.464 NA -1781.657 NA NA NA -1793.891
## 9 -2422.871 NA -1774.321 NA NA NA -1787.903
## VEE EVE VVE EEV VEV EVV VVV
## 1 -1769.896 -1769.896 -1769.896 -1769.896 -1769.896 -1769.896 -1769.896
## 2 NA NA NA -1967.937 -1801.372 NA NA
## 3 NA NA NA NA -1774.121 NA NA
## 4 NA NA NA NA NA NA NA
## 5 NA NA NA NA NA NA NA
## 6 NA NA NA NA NA NA NA
## 7 NA NA NA NA NA NA NA
## 8 NA NA NA NA NA NA NA
## 9 NA NA NA NA NA NA NA
##
## Top 3 models based on the BIC criterion:
## VEI,5 VEI,4 VEI,3
## -1666.039 -1679.718 -1686.899
3.1.1 Number of Clusters
plot(mod1, what = 'BIC')
mod1b = Mclust(dtx, G = 5, modelNames = c("VEI"))
summary(mod1b, parameters = TRUE)
## ----------------------------------------------------
## Gaussian finite mixture model fitted by EM algorithm
## ----------------------------------------------------
##
## Mclust VEI (diagonal, equal shape) model with 5 components:
##
## log-likelihood n df BIC ICL
## -702.5442 34 74 -1666.039 -1666.041
##
## Clustering table:
## 1 2 3 4 5
## 15 2 4 6 7
##
## Mixing probabilities:
## 1 2 3 4 5
## 0.44120997 0.05882351 0.11761936 0.17647041 0.20587676
##
## Means:
## [,1] [,2] [,3] [,4] [,5]
## Gada 52.499481629 58.6000015 6.315995e+01 8.090834e+01 5.509838e+01
## KG 4.833780575 1.9550000 1.699916e+00 1.076667e+00 1.848583e+00
## KH 3.049179029 0.7750001 9.425229e-01 6.649999e-01 3.228608e+00
## GM 0.159328371 1.1050001 7.499107e-02 1.745952e-19 3.316218e-19
## APB 10.048409835 6.8000004 7.267441e+00 2.468335e+00 1.847121e+00
## GPL 5.673073961 1.1450000 3.170129e+00 1.230000e+00 1.141446e+00
## TSN 0.005999547 0.0600000 1.290277e-15 6.285241e-29 3.516587e-19
## GB 17.454556575 22.2749975 1.006292e+01 4.418326e+00 9.971200e-01
## BJR 1.991941180 0.9899999 1.305008e+00 5.900004e-01 6.728486e-01
## BB 20.964429152 12.8899996 1.400227e+01 9.368336e+00 3.988306e+01
## TL 9.450756470 6.9450004 1.216003e+01 3.226666e+00 4.344297e+00
##
## Variances:
## [,,1]
## Gada KG KH GM APB GPL TSN GB
## Gada 302.967 0.00000 0.00000 0.0000000 0.0000 0.000 0.0000000000 0.0000
## KG 0.000 13.65424 0.00000 0.0000000 0.0000 0.000 0.0000000000 0.0000
## KH 0.000 0.00000 18.08222 0.0000000 0.0000 0.000 0.0000000000 0.0000
## GM 0.000 0.00000 0.00000 0.1303904 0.0000 0.000 0.0000000000 0.0000
## APB 0.000 0.00000 0.00000 0.0000000 64.9537 0.000 0.0000000000 0.0000
## GPL 0.000 0.00000 0.00000 0.0000000 0.0000 15.918 0.0000000000 0.0000
## TSN 0.000 0.00000 0.00000 0.0000000 0.0000 0.000 0.0001692191 0.0000
## GB 0.000 0.00000 0.00000 0.0000000 0.0000 0.000 0.0000000000 373.9283
## BJR 0.000 0.00000 0.00000 0.0000000 0.0000 0.000 0.0000000000 0.0000
## BB 0.000 0.00000 0.00000 0.0000000 0.0000 0.000 0.0000000000 0.0000
## TL 0.000 0.00000 0.00000 0.0000000 0.0000 0.000 0.0000000000 0.0000
## BJR BB TL
## Gada 0.000000 0.0000 0.00000
## KG 0.000000 0.0000 0.00000
## KH 0.000000 0.0000 0.00000
## GM 0.000000 0.0000 0.00000
## APB 0.000000 0.0000 0.00000
## GPL 0.000000 0.0000 0.00000
## TSN 0.000000 0.0000 0.00000
## GB 0.000000 0.0000 0.00000
## BJR 3.322503 0.0000 0.00000
## BB 0.000000 184.0492 0.00000
## TL 0.000000 0.0000 52.02535
## [,,2]
## Gada KG KH GM APB GPL TSN
## Gada 33.78676 0.000000 0.000000 0.00000000 0.00000 0.000000 0.000000e+00
## KG 0.00000 1.522716 0.000000 0.00000000 0.00000 0.000000 0.000000e+00
## KH 0.00000 0.000000 2.016522 0.00000000 0.00000 0.000000 0.000000e+00
## GM 0.00000 0.000000 0.000000 0.01454108 0.00000 0.000000 0.000000e+00
## APB 0.00000 0.000000 0.000000 0.00000000 7.24361 0.000000 0.000000e+00
## GPL 0.00000 0.000000 0.000000 0.00000000 0.00000 1.775169 0.000000e+00
## TSN 0.00000 0.000000 0.000000 0.00000000 0.00000 0.000000 1.887124e-05
## GB 0.00000 0.000000 0.000000 0.00000000 0.00000 0.000000 0.000000e+00
## BJR 0.00000 0.000000 0.000000 0.00000000 0.00000 0.000000 0.000000e+00
## BB 0.00000 0.000000 0.000000 0.00000000 0.00000 0.000000 0.000000e+00
## TL 0.00000 0.000000 0.000000 0.00000000 0.00000 0.000000 0.000000e+00
## GB BJR BB TL
## Gada 0.00000 0.0000000 0.00000 0.000000
## KG 0.00000 0.0000000 0.00000 0.000000
## KH 0.00000 0.0000000 0.00000 0.000000
## GM 0.00000 0.0000000 0.00000 0.000000
## APB 0.00000 0.0000000 0.00000 0.000000
## GPL 0.00000 0.0000000 0.00000 0.000000
## TSN 0.00000 0.0000000 0.00000 0.000000
## GB 41.70033 0.0000000 0.00000 0.000000
## BJR 0.00000 0.3705242 0.00000 0.000000
## BB 0.00000 0.0000000 20.52509 0.000000
## TL 0.00000 0.0000000 0.00000 5.801846
## [,,3]
## Gada KG KH GM APB GPL TSN
## Gada 26.81707 0.000000 0.000000 0.00000000 0.000000 0.000000 0.00000e+00
## KG 0.00000 1.208603 0.000000 0.00000000 0.000000 0.000000 0.00000e+00
## KH 0.00000 0.000000 1.600545 0.00000000 0.000000 0.000000 0.00000e+00
## GM 0.00000 0.000000 0.000000 0.01154148 0.000000 0.000000 0.00000e+00
## APB 0.00000 0.000000 0.000000 0.00000000 5.749366 0.000000 0.00000e+00
## GPL 0.00000 0.000000 0.000000 0.00000000 0.000000 1.408979 0.00000e+00
## TSN 0.00000 0.000000 0.000000 0.00000000 0.000000 0.000000 1.49784e-05
## GB 0.00000 0.000000 0.000000 0.00000000 0.000000 0.000000 0.00000e+00
## BJR 0.00000 0.000000 0.000000 0.00000000 0.000000 0.000000 0.00000e+00
## BB 0.00000 0.000000 0.000000 0.00000000 0.000000 0.000000 0.00000e+00
## TL 0.00000 0.000000 0.000000 0.00000000 0.000000 0.000000 0.00000e+00
## GB BJR BB TL
## Gada 0.0000 0.0000000 0.00000 0.000000
## KG 0.0000 0.0000000 0.00000 0.000000
## KH 0.0000 0.0000000 0.00000 0.000000
## GM 0.0000 0.0000000 0.00000 0.000000
## APB 0.0000 0.0000000 0.00000 0.000000
## GPL 0.0000 0.0000000 0.00000 0.000000
## TSN 0.0000 0.0000000 0.00000 0.000000
## GB 33.0982 0.0000000 0.00000 0.000000
## BJR 0.0000 0.2940908 0.00000 0.000000
## BB 0.0000 0.0000000 16.29108 0.000000
## TL 0.0000 0.0000000 0.00000 4.605015
## [,,4]
## Gada KG KH GM APB GPL TSN
## Gada 13.50905 0.0000000 0.0000000 0.000000 0.000000 0.0000000 0.000000e+00
## KG 0.00000 0.6088314 0.0000000 0.000000 0.000000 0.0000000 0.000000e+00
## KH 0.00000 0.0000000 0.8062713 0.000000 0.000000 0.0000000 0.000000e+00
## GM 0.00000 0.0000000 0.0000000 0.005814 0.000000 0.0000000 0.000000e+00
## APB 0.00000 0.0000000 0.0000000 0.000000 2.896232 0.0000000 0.000000e+00
## GPL 0.00000 0.0000000 0.0000000 0.000000 0.000000 0.7097707 0.000000e+00
## TSN 0.00000 0.0000000 0.0000000 0.000000 0.000000 0.0000000 7.545339e-06
## GB 0.00000 0.0000000 0.0000000 0.000000 0.000000 0.0000000 0.000000e+00
## BJR 0.00000 0.0000000 0.0000000 0.000000 0.000000 0.0000000 0.000000e+00
## BB 0.00000 0.0000000 0.0000000 0.000000 0.000000 0.0000000 0.000000e+00
## TL 0.00000 0.0000000 0.0000000 0.000000 0.000000 0.0000000 0.000000e+00
## GB BJR BB TL
## Gada 0.00000 0.0000000 0.000000 0.000000
## KG 0.00000 0.0000000 0.000000 0.000000
## KH 0.00000 0.0000000 0.000000 0.000000
## GM 0.00000 0.0000000 0.000000 0.000000
## APB 0.00000 0.0000000 0.000000 0.000000
## GPL 0.00000 0.0000000 0.000000 0.000000
## TSN 0.00000 0.0000000 0.000000 0.000000
## GB 16.67316 0.0000000 0.000000 0.000000
## BJR 0.00000 0.1481477 0.000000 0.000000
## BB 0.00000 0.0000000 8.206602 0.000000
## TL 0.00000 0.0000000 0.000000 2.319767
## [,,5]
## Gada KG KH GM APB GPL TSN
## Gada 22.86682 0.000000 0.000000 0.000000000 0.000000 0.000000 0.000000e+00
## KG 0.00000 1.030571 0.000000 0.000000000 0.000000 0.000000 0.000000e+00
## KH 0.00000 0.000000 1.364778 0.000000000 0.000000 0.000000 0.000000e+00
## GM 0.00000 0.000000 0.000000 0.009841379 0.000000 0.000000 0.000000e+00
## APB 0.00000 0.000000 0.000000 0.000000000 4.902462 0.000000 0.000000e+00
## GPL 0.00000 0.000000 0.000000 0.000000000 0.000000 1.201431 0.000000e+00
## TSN 0.00000 0.000000 0.000000 0.000000000 0.000000 0.000000 1.277202e-05
## GB 0.00000 0.000000 0.000000 0.000000000 0.000000 0.000000 0.000000e+00
## BJR 0.00000 0.000000 0.000000 0.000000000 0.000000 0.000000 0.000000e+00
## BB 0.00000 0.000000 0.000000 0.000000000 0.000000 0.000000 0.000000e+00
## TL 0.00000 0.000000 0.000000 0.000000000 0.000000 0.000000 0.000000e+00
## GB BJR BB TL
## Gada 0.00000 0.0000000 0.00000 0.000000
## KG 0.00000 0.0000000 0.00000 0.000000
## KH 0.00000 0.0000000 0.00000 0.000000
## GM 0.00000 0.0000000 0.00000 0.000000
## APB 0.00000 0.0000000 0.00000 0.000000
## GPL 0.00000 0.0000000 0.00000 0.000000
## TSN 0.00000 0.0000000 0.00000 0.000000
## GB 28.22271 0.0000000 0.00000 0.000000
## BJR 0.00000 0.2507701 0.00000 0.000000
## BB 0.00000 0.0000000 13.89134 0.000000
## TL 0.00000 0.0000000 0.00000 3.926679
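As a check on the summary above: mclust defines BIC as 2 * log-likelihood - df * log(n), so larger (less negative) values are better, which is why VEI with 5 components ranks first. The sketch below plugs in the reported log-likelihood, df and n. The breakdown of the 74 parameters into means, mixing proportions, VEI volumes and shape terms is our own reading of the VEI model (diagonal, varying volume, equal shape), not something printed in the output.

```r
# mclust's BIC convention: BIC = 2 * logLik - df * log(n); larger is better
loglik <- -702.5442  # log-likelihood reported by summary(mod1b)
n <- 34              # number of provinces
G <- 5               # mixture components
d <- 11              # variables
# VEI: G * d means + (G - 1) mixing proportions + G volumes + (d - 1) shape terms
df <- G * d + (G - 1) + G + (d - 1)
bic <- 2 * loglik - df * log(n)
c(df = df, BIC = round(bic, 3))
```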
library(dplyr)
# Count the observations in each cluster
cluster_frequencies <- table(mod1b$classification)
# Sort the cluster labels by frequency, largest first
sorted_clusters <- names(sort(cluster_frequencies, decreasing = TRUE))
# New labels 1, 2, ..., k (here 1 to 5)
new_order <- 1:length(sorted_clusters)
# Relabel so that cluster 1 is the largest, cluster 2 the next largest, and so on
mod1b$classification <- recode(mod1b$classification, !!!setNames(as.character(new_order), sorted_clusters))
table(mod1b$classification)
##
## 1 2 3 4 5
## 15 7 6 4 2
3.1.2 Cluster Plot
library(factoextra)
fviz_cluster(mod1b, data = dtx, repel = TRUE, labelsize =8)
3.1.3 Profile
data.clust1 <- cbind(dtx, Cluster = mod1b[["classification"]])
# Calculate the mean of each variable for each cluster
cluster_profiles1 <- aggregate(. ~ Cluster, data.clust1, mean)
# Print the cluster profiles
print(cluster_profiles1)
## Cluster Gada KG KH GM APB GPL TSN
## 1 1 52.49867 4.834000 3.049333 0.1593333 10.048667 5.673333 0.006
## 2 2 55.09857 1.848571 3.228571 0.0000000 1.847143 1.141429 0.000
## 3 3 80.90833 1.076667 0.665000 0.0000000 2.468333 1.230000 0.000
## 4 4 63.16000 1.700000 0.942500 0.0750000 7.267500 3.170000 0.000
## 5 5 58.60000 1.955000 0.775000 1.1050000 6.800000 1.145000 0.060
## GB BJR BB TL
## 1 17.4553333 1.9920000 20.964667 9.450667
## 2 0.9971429 0.6728571 39.882857 4.344286
## 3 4.4183333 0.5900000 9.368333 3.226667
## 4 10.0625000 1.3050000 14.002500 12.160000
## 5 22.2750000 0.9900000 12.890000 6.945000
# Convert the data to long format for plotting
cluster_profiles_long1 <- tidyr::pivot_longer(cluster_profiles1, -Cluster,
names_to = "Variable", values_to = "Value")
# Create the bar plot
ggplot(cluster_profiles_long1, aes(x = Cluster, y = Value, fill = Variable)) +
geom_bar(stat = "identity", position = "dodge") +
labs(x = "Cluster", y = "Mean Value", fill = "Variable") +
theme_minimal() +
ggtitle("Cluster Profiles")
library(ggiraphExtra)
data.akhir1 <- cbind(raw.data2[-35,], Cluster = mod1b[["classification"]]) %>%
relocate(Cluster, .before = 2)
# Radar Plot
ggRadar(
data = data.akhir1,
mapping = aes(colours = Cluster)
) +
theme_light() +
theme(
text = element_text(size = 10), # Global font size
title = element_text(size = 12), # Title font size
axis.text = element_text(size = 10), # Axis label font size
legend.text = element_text(size = 8) # Legend font size
)
3.1.4 Map
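# st_read() and geom_sf() need sf; rgdal has been retired from CRAN
install_load("spdep", "sf")
# `wd` is assumed to hold the project directory that contains the "SHP Indonesia" folder
indo <- st_read(dsn= paste0(wd,"/SHP Indonesia/prov.shp"),
quiet = TRUE)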
data.map <- cbind(data.clust1, KODE=raw.data3$KODE[-35])
data.indo <- indo %>%
inner_join(data.map, by = c("KODE" = "KODE"))
ggplot() +
geom_sf(data=data.indo, aes(fill=factor(`Cluster`))) +
scale_fill_manual(values=c("1" = "indianred", "2" = "lightgreen", "3" = "dodgerblue",
"4"="cyan3", "5"="purple3"),
name = "Keterangan") +
labs(title = "Cluster Bencana Alam \n pada Provinsi Indonesia 2021",
x = "Longitude",
y = "Latitude") +
theme_minimal() +
theme(legend.text = element_text(size=10),
legend.title = element_text(size=10, face="bold"),
axis.text.x = element_text(size = 10),
axis.text.y = element_text(size = 10),
plot.title = element_text(size=12, face="bold", hjust = 0.5)) +
scale_x_continuous(labels = function(x) paste0(x, "°")) +
scale_y_continuous(labels = function(y) paste0(y, "°"))
3.1.5 Exporting Cluster Data
#Export Data
install_load('openxlsx')
#Tentative model
write.xlsx(list("GMM" = data.akhir1),
file = "Data_Cluster.xlsx")
3.2 Number of Clusters
#calculate gap statistic for each number of clusters (up to 10 clusters)
gap_stat <- clusGap(dtx, FUN = hcut, nstart = 25, K.max = 10, B = 50)
#produce plot of clusters vs. gap statistic
fviz_gap_stat(gap_stat)
## Silhouette coefficient and elbow method
fviz_nbclust(dtx, kmeans, method = "silhouette") #silhouette, k=3
fviz_nbclust(dtx, kmeans, method = "wss") #wss, k=1
fviz_nbclust(x = dtx, FUNcluster = kmeans, method = "gap_stat") #gap_stat, k=1
library(NbClust)
library("factoextra")
nb <- NbClust(data = dtx, distance = "euclidean", method="kmeans")
## *** : The Hubert index is a graphical method of determining the number of clusters.
## In the plot of Hubert index, we seek a significant knee that corresponds to a
## significant increase of the value of the measure i.e the significant peak in Hubert
## index second differences plot.
##
## *** : The D index is a graphical method of determining the number of clusters.
## In the plot of D index, we seek a significant knee (the significant peak in Dindex
## second differences plot) that corresponds to a significant increase of the value of
## the measure.
##
## *******************************************************************
## * Among all indices:
## * 5 proposed 2 as the best number of clusters
## * 6 proposed 3 as the best number of clusters
## * 4 proposed 4 as the best number of clusters
## * 2 proposed 5 as the best number of clusters
## * 1 proposed 12 as the best number of clusters
## * 3 proposed 13 as the best number of clusters
## * 1 proposed 14 as the best number of clusters
## * 1 proposed 15 as the best number of clusters
##
## ***** Conclusion *****
##
## * According to the majority rule, the best number of clusters is 3
##
##
## *******************************************************************
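# fviz_nbclust(nb) fails on recent R versions ("the condition has length > 1"),
# so tabulate the number of clusters proposed by each index directly
best_k <- nb$Best.nc[1, ]
best_k <- best_k[!is.na(best_k) & best_k > 0]  # drop NA/zero entries (no k proposed)
barplot(table(best_k), xlab = "Number of clusters k", ylab = "Number of indices",
main = "Optimal number of clusters proposed by NbClust indices")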
3.3 Fuzzy
res.fcm <- fcm(dtx, centers=3)
as.data.frame(res.fcm$u)[1:6,]
## Cluster 1 Cluster 2 Cluster 3
## 1 0.2467255 0.7281924 0.02508206
## 2 0.1461973 0.8098839 0.04391880
## 3 0.3019321 0.2170957 0.48097216
## 4 0.4832460 0.4792199 0.03753414
## 5 0.7372493 0.2379269 0.02482383
## 6 0.1431290 0.8231577 0.03371328
3.3.1 Initial and Final Cluster Prototype Matrices
res.fcm$v0
## Gada KG KH GM APB GPL TSN GB BJR BB TL
## Cluster 1 56.93 0.00 0.37 0 0.00 1.87 0 0.37 0.00 40.82 2.62
## Cluster 2 67.43 0.00 3.56 0 19.59 4.33 0 0.00 0.00 15.01 0.25
## Cluster 3 44.26 3.71 1.55 0 21.40 4.92 0 31.41 5.61 29.51 19.15
res.fcm$v
## Gada KG KH GM APB GPL TSN
## Cluster 1 55.18355 3.134711 3.071610 0.05494674 4.228712 2.255503 0.004726236
## Cluster 2 68.89251 2.482674 1.664947 0.15613349 6.220171 3.243801 0.006948835
## Cluster 3 33.03813 2.531644 1.422857 0.11012486 6.770252 7.358808 0.001777819
## GB BJR BB TL
## Cluster 1 4.890740 1.201827 34.09940 6.124082
## Cluster 2 7.439088 1.083219 13.37960 6.376892
## Cluster 3 51.059097 2.014997 21.96576 13.126262
summary(res.fcm)
## Summary for 'res.fcm'
##
## Number of data objects: 34
##
## Number of clusters: 3
##
## Crisp clustering vector:
## [1] 2 2 3 1 1 2 2 2 2 2 1 1 2 2 2 1 2 2 1 1 1 1 1 1 2 1 2 2 1 3 2 3 2 2
##
## Initial cluster prototypes:
## Gada KG KH GM APB GPL TSN GB BJR BB TL
## Cluster 1 56.93 0.00 0.37 0 0.00 1.87 0 0.37 0.00 40.82 2.62
## Cluster 2 67.43 0.00 3.56 0 19.59 4.33 0 0.00 0.00 15.01 0.25
## Cluster 3 44.26 3.71 1.55 0 21.40 4.92 0 31.41 5.61 29.51 19.15
##
## Final cluster prototypes:
## Gada KG KH GM APB GPL TSN
## Cluster 1 55.18355 3.134711 3.071610 0.05494674 4.228712 2.255503 0.004726236
## Cluster 2 68.89251 2.482674 1.664947 0.15613349 6.220171 3.243801 0.006948835
## Cluster 3 33.03813 2.531644 1.422857 0.11012486 6.770252 7.358808 0.001777819
## GB BJR BB TL
## Cluster 1 4.890740 1.201827 34.09940 6.124082
## Cluster 2 7.439088 1.083219 13.37960 6.376892
## Cluster 3 51.059097 2.014997 21.96576 13.126262
##
## Distance between the final cluster prototypes
## Cluster 1 Cluster 2
## Cluster 2 631.1748
## Cluster 3 2854.4422 3325.6859
##
## Difference between the initial and final cluster prototypes
## Gada KG KH GM APB GPL
## Cluster 1 -1.746453 3.134711 2.7016103 0.05494674 4.228712 0.3855031
## Cluster 2 1.462514 2.482674 -1.8950533 0.15613349 -13.369829 -1.0861992
## Cluster 3 -11.221875 -1.178356 -0.1271435 0.11012486 -14.629748 2.4388080
## TSN GB BJR BB TL
## Cluster 1 0.004726236 4.520740 1.201827 -6.720602 3.504082
## Cluster 2 0.006948835 7.439088 1.083219 -1.630404 6.126892
## Cluster 3 0.001777819 19.649097 -3.595003 -7.544241 -6.023738
##
## Root Mean Squared Deviations (RMSD): 20.37673
## Mean Absolute Deviation (MAD): 482.0302
##
## Membership degrees matrix (top and bottom 5 rows):
## Cluster 1 Cluster 2 Cluster 3
## 1 0.2467255 0.7281924 0.02508206
## 2 0.1461973 0.8098839 0.04391880
## 3 0.3019321 0.2170957 0.48097216
## 4 0.4832460 0.4792199 0.03753414
## 5 0.7372493 0.2379269 0.02482383
## ...
## Cluster 1 Cluster 2 Cluster 3
## 30 0.11375796 0.1036085 0.78263353
## 31 0.15646524 0.8071614 0.03637335
## 32 0.02702304 0.0229784 0.94999856
## 33 0.12893606 0.8220949 0.04896906
## 34 0.22157498 0.6982273 0.08019770
##
## Descriptive statistics for the membership degrees by clusters
## Size Min Q1 Mean Median Q3 Max
## Cluster 1 13 0.3978128 0.4832460 0.7121100 0.8004335 0.8709146 0.9022756
## Cluster 2 18 0.4433379 0.6685871 0.7360856 0.8085227 0.8312259 0.8553647
## Cluster 3 3 0.4809722 0.6318028 0.7378681 0.7826335 0.8663160 0.9499986
##
## Dunn's Fuzziness Coefficients:
## dunn_coeff normalized
## 0.6156167 0.4234250
##
## Within cluster sum of squares by cluster:
## 1 2 3
## 4289.080 4747.395 1882.548
## (between_SS / total_SS = 54.16%)
##
## Available components:
## [1] "u" "v" "v0" "d" "x"
## [6] "cluster" "csize" "sumsqrs" "k" "m"
## [11] "iter" "best.start" "func.val" "comp.time" "inpargs"
## [16] "algorithm" "call"
3.3.2 Run FCM with Multiple Starts
res.fcm <- fcm(dtx, centers=3, nstart=5)
res.fcm$func.val
## [1] 6826.256 6826.256 6826.256 6826.256 6826.256
res.fcm$iter
## [1] 151 127 153 135 133
res.fcm$best.start
## [1] 1
summary(res.fcm)
## Summary for 'res.fcm'
##
## Number of data objects: 34
##
## Number of clusters: 3
##
## Crisp clustering vector:
## [1] 2 2 3 1 1 2 2 2 2 2 1 1 2 2 2 1 2 2 1 1 1 1 1 1 2 1 2 2 1 3 2 3 2 2
##
## Initial cluster prototypes:
## Gada KG KH GM APB GPL TSN GB BJR BB TL
## Cluster 1 58.42 1.30 1.36 0 5.82 5.16 0.00 18.97 2.01 15.65 11.85
## Cluster 2 16.15 3.38 1.23 0 4.00 5.38 0.00 76.62 1.08 19.23 22.15
## Cluster 3 58.89 4.96 0.52 0 6.89 2.51 0.06 10.18 2.64 27.00 6.19
##
## Final cluster prototypes:
## Gada KG KH GM APB GPL TSN
## Cluster 1 55.18355 3.134711 3.071610 0.05494674 4.228712 2.255503 0.004726236
## Cluster 2 68.89251 2.482674 1.664947 0.15613349 6.220171 3.243801 0.006948835
## Cluster 3 33.03813 2.531644 1.422857 0.11012486 6.770252 7.358808 0.001777819
## GB BJR BB TL
## Cluster 1 4.890740 1.201827 34.09940 6.124082
## Cluster 2 7.439088 1.083219 13.37960 6.376892
## Cluster 3 51.059097 2.014997 21.96576 13.126262
##
## Distance between the final cluster prototypes
## Cluster 1 Cluster 2
## Cluster 2 631.1748
## Cluster 3 2854.4422 3325.6859
##
## Difference between the initial and final cluster prototypes
## Gada KG KH GM APB GPL
## Cluster 1 -3.236453 1.834711 1.7116103 0.05494674 -1.5912880 -2.904497
## Cluster 2 52.742514 -0.897326 0.4349467 0.15613349 2.2201706 -2.136199
## Cluster 3 -25.851875 -2.428356 0.9028565 0.11012486 -0.1197479 4.848808
## TSN GB BJR BB TL
## Cluster 1 0.004726236 -14.07926 -0.808173316 18.449398 -5.725918
## Cluster 2 0.006948835 -69.18091 0.003219146 -5.850404 -15.773108
## Cluster 3 -0.058222181 40.87910 -0.625002729 -5.034241 6.936262
##
## Root Mean Squared Deviations (RMSD): 60.28987
## Mean Absolute Deviation (MAD): 1054.524
##
## Membership degrees matrix (top and bottom 5 rows):
## Cluster 1 Cluster 2 Cluster 3
## 1 0.2467255 0.7281924 0.02508206
## 2 0.1461973 0.8098839 0.04391880
## 3 0.3019321 0.2170957 0.48097216
## 4 0.4832460 0.4792199 0.03753414
## 5 0.7372493 0.2379269 0.02482383
## ...
## Cluster 1 Cluster 2 Cluster 3
## 30 0.11375796 0.1036085 0.78263353
## 31 0.15646524 0.8071614 0.03637335
## 32 0.02702304 0.0229784 0.94999856
## 33 0.12893606 0.8220949 0.04896906
## 34 0.22157498 0.6982273 0.08019770
##
## Descriptive statistics for the membership degrees by clusters
## Size Min Q1 Mean Median Q3 Max
## Cluster 1 13 0.3978128 0.4832460 0.7121100 0.8004335 0.8709146 0.9022756
## Cluster 2 18 0.4433379 0.6685871 0.7360856 0.8085227 0.8312259 0.8553647
## Cluster 3 3 0.4809722 0.6318028 0.7378681 0.7826335 0.8663160 0.9499986
##
## Dunn's Fuzziness Coefficients:
## dunn_coeff normalized
## 0.6156167 0.4234250
##
## Within cluster sum of squares by cluster:
## 1 2 3
## 4289.080 4747.395 1882.548
## (between_SS / total_SS = 54.16%)
##
## Available components:
## [1] "u" "v" "v0" "d" "x"
## [6] "cluster" "csize" "sumsqrs" "k" "m"
## [11] "iter" "best.start" "func.val" "comp.time" "inpargs"
## [16] "algorithm" "call"
3.3.3 Pairwise Scatter Plots
plotcluster(res.fcm, cp=1, trans=TRUE)
set.seed(12333333)
res.fcm2 <- ppclust2(res.fcm, "kmeans")
# Count the observations in each cluster
cluster_frequencies <- table(res.fcm2[["cluster"]])
# Sort the cluster labels by frequency, largest first
sorted_clusters <- names(sort(cluster_frequencies, decreasing = TRUE))
# New labels 1, 2, ..., k (here 1 to 3)
new_order <- 1:length(sorted_clusters)
# Relabel so that cluster 1 is the largest, cluster 2 the next largest, and so on
res.fcm2[["cluster"]] <- recode(res.fcm2[["cluster"]], !!!setNames(as.character(new_order), sorted_clusters))
table(res.fcm2[["cluster"]])
##
## 1 2 3
## 18 13 3
3.3.4 Cluster Plot with fviz_cluster
fviz_cluster(res.fcm2, data = dtx,
ellipse.type = "convex",
palette = "jco",
repel = TRUE)
table(res.fcm2[["cluster"]])
##
## 1 2 3
## 18 13 3
data.akhir <- cbind(raw.data2[-35,], Cluster = res.fcm2[["cluster"]]) %>%
relocate(Cluster, .before = 2)
datatable(data.akhir)
table(data.akhir$Cluster)
##
## 1 2 3
## 18 13 3
3.3.5 Profile of Each Cluster
data.clust <- cbind(dtx, Cluster = res.fcm2[["cluster"]])
# Calculate the mean of each variable for each cluster
cluster_profiles <- aggregate(. ~ Cluster, data.clust, mean)
# Print the cluster profiles
print(cluster_profiles)
## Cluster Gada KG KH GM APB GPL TSN
## 1 1 68.34444 2.394444 1.768889 0.17833333 6.526111 3.591111 0.008333333
## 2 2 54.04385 3.956154 3.207692 0.12384615 5.751538 2.320000 0.004615385
## 3 3 31.89667 2.696667 1.370000 0.02666667 9.633333 6.880000 0.000000000
## GB BJR BB TL
## 1 8.217778 0.9611111 12.75389 6.284444
## 2 5.773077 1.5130769 33.14000 7.151538
## 3 52.383333 2.7866667 23.75333 15.990000
# Convert the data to long format for plotting
cluster_profiles_long <- tidyr::pivot_longer(cluster_profiles, -Cluster,
names_to = "Variable", values_to = "Value")
# Create the bar plot
ggplot(cluster_profiles_long, aes(x = Cluster, y = Value, fill = Variable)) +
geom_bar(stat = "identity", position = "dodge") +
labs(x = "Cluster", y = "Mean Value", fill = "Variable") +
theme_minimal() +
ggtitle("Cluster Profiles")
3.3.6 Radar Plot
library(ggiraphExtra)
# Radar plot: one polygon per cluster
ggRadar(
  data = data.akhir,
  mapping = aes(colours = Cluster)
) +
  theme_light() +
  theme(
    text = element_text(size = 10),       # global font size
    title = element_text(size = 12),      # title font size
    axis.text = element_text(size = 10),  # axis label font size
    legend.text = element_text(size = 8)  # legend font size
  )
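The variables here sit on very different scales (in the profiles above, `Gada` averages roughly 30 to 70 while `TSN` stays near 0), so radar charts are usually read after putting every variable on a common 0 to 1 range. The sketch below shows plain min-max rescaling in base R. `ggRadar()` may apply its own rescaling; we have not checked that it is exactly this one, and `minmax_rescale()` is a hypothetical helper.

```r
# Hypothetical helper: rescale each numeric column to [0, 1] (min-max scaling)
minmax_rescale <- function(df) {
  num <- vapply(df, is.numeric, logical(1))
  df[num] <- lapply(df[num], function(v) {
    rng <- range(v, na.rm = TRUE)
    if (rng[1] == rng[2]) return(rep(0, length(v)))  # constant column: map to 0
    (v - rng[1]) / (rng[2] - rng[1])
  })
  df
}

# Toy cluster means on very different scales
means <- data.frame(Cluster = c("1", "2", "3"),
                    Gada = c(68, 54, 32), TSN = c(0.008, 0.005, 0))
minmax_rescale(means)
```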
3.3.7 Validation of the Clustering Results
# Convert the ppclust result into an fclust object so fclust's validity indices can be used
res.fcm4 <- ppclust2(res.fcm, "fclust")
# Fuzzy Silhouette Index:
idxsf <- SIL.F(res.fcm4$Xca, res.fcm4$U, alpha=1)
paste("Fuzzy Silhouette Index: ",idxsf)
## [1] "Fuzzy Silhouette Index: 0.552167291736516"
# Partition Entropy:
idxpe <- PE(res.fcm4$U)
paste("Partition Entropy: ",idxpe)
## [1] "Partition Entropy: 0.66950689169937"
# Partition Coefficient:
idxpc <- PC(res.fcm4$U)
paste("Partition Coefficient : ",idxpc)
## [1] "Partition Coefficient : 0.615616663330754"
# Modified Partition Coefficient:
idxmpc <- MPC(res.fcm4$U)
paste("Modified Partition Coefficient :",idxmpc)
## [1] "Modified Partition Coefficient : 0.423424994996131"
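For reference, these membership-based indices have simple closed forms. With n observations, k clusters and membership matrix U = (u_ik): PC = (1/n) sum(u_ik^2), higher meaning crisper; PE = -(1/n) sum(u_ik log u_ik), lower meaning crisper; and MPC = 1 - k/(k - 1) * (1 - PC). The fuzzy silhouette weights each observation's crisp silhouette by (largest membership - second largest membership)^alpha. As a consistency check, plugging the reported PC = 0.615616663330754 with k = 3 into the MPC formula reproduces the reported MPC = 0.423424994996131. The base-R sketch below is illustrative only: the function names are hypothetical, it assumes Euclidean distance and the natural logarithm for PE, and it is not the fclust implementation.

```r
# Hypothetical helpers for fuzzy validity indices (not the fclust API).
# U: n x k membership matrix whose rows sum to 1.
pc_index <- function(U) sum(U^2) / nrow(U)            # partition coefficient
pe_index <- function(U) {                             # partition entropy (natural log)
  terms <- ifelse(U > 0, U * log(U), 0)               # treat 0 * log(0) as 0
  -sum(terms) / nrow(U)
}
mpc_index <- function(U) {                            # modified partition coefficient
  k <- ncol(U)
  1 - k / (k - 1) * (1 - pc_index(U))
}

# Crisp silhouette with Euclidean distances; singletons get 0
crisp_silhouette <- function(x, cl) {
  D <- as.matrix(dist(x))
  n <- nrow(D)
  sapply(seq_len(n), function(i) {
    own <- cl == cl[i] & seq_len(n) != i
    if (!any(own)) return(0)
    a <- mean(D[i, own])                              # mean distance to own cluster
    b <- min(sapply(setdiff(unique(cl), cl[i]),
                    function(g) mean(D[i, cl == g]))) # nearest other cluster
    (b - a) / max(a, b)
  })
}

# Fuzzy silhouette: silhouettes weighted by (top membership - second membership)^alpha
fuzzy_silhouette <- function(x, U, alpha = 1) {
  cl <- max.col(U, ties.method = "first")             # crisp labels from memberships
  top2 <- t(apply(U, 1, function(u) sort(u, decreasing = TRUE)[1:2]))
  w <- (top2[, 1] - top2[, 2])^alpha
  sum(w * crisp_silhouette(x, cl)) / sum(w)
}

# Toy example: two well-separated pairs of points
x <- rbind(c(0, 0), c(0, 1), c(10, 10), c(10, 11))
U <- rbind(c(0.9, 0.1), c(0.8, 0.2), c(0.2, 0.8), c(0.1, 0.9))
c(PC = pc_index(U), PE = pe_index(U), MPC = mpc_index(U),
  FS = fuzzy_silhouette(x, U))
```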
3.3.8 Gap Index
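install_load("clusterSim", "cluster")  # clusterSim for index.Gap(), cluster for pam()
# PAM partitions with 4 and 5 clusters; clall holds the u- and (u+1)-cluster partitions side by side
cl1 <- pam(dtx, 4)
cl2 <- pam(dtx, 5)
clall <- cbind(cl1$clustering, cl2$clustering)
# B = 10 random reference datasets, so results vary between runs unless a seed is set
g <- index.Gap(dtx, clall, reference.distribution = "unif", B = 10, method = "pam")
print(g)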
## $gap
## [1] 0.9103916
##
## $diffu
## [1] -0.02054693
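The gap statistic (Tibshirani, Walther and Hastie, 2001) compares log W_k, the log within-cluster dispersion of the data, with its expected value under a reference distribution that has no cluster structure: Gap(k) = E*[log W_k] - log W_k. Their rule picks the smallest k with Gap(k) >= Gap(k + 1) - s_(k+1). The base-R sketch below is illustrative only: it assumes W_k is the k-means total within-cluster sum of squares and draws uniform reference data over each variable's observed range, so its numbers will not match `index.Gap()`, which is called with `method = "pam"` above. `gap_statistic()` and `within_ss()` are hypothetical helpers.

```r
# Within-cluster sum of squares for k clusters (k = 1: spread around the grand mean)
within_ss <- function(x, k) {
  if (k == 1) return(sum(scale(x, scale = FALSE)^2))
  kmeans(x, centers = k, nstart = 10, iter.max = 50)$tot.withinss
}

# Hypothetical helper: gap statistic with uniform reference data
gap_statistic <- function(x, k_max = 5, B = 20) {
  x <- as.matrix(x)
  log_w <- sapply(seq_len(k_max), function(k) log(within_ss(x, k)))
  rng <- apply(x, 2, range)                     # row 1 = min, row 2 = max per column
  ref_log_w <- replicate(B, {
    ref <- apply(rng, 2, function(r) runif(nrow(x), r[1], r[2]))
    sapply(seq_len(k_max), function(k) log(within_ss(ref, k)))
  })                                            # k_max x B matrix
  data.frame(
    k = seq_len(k_max),
    gap = rowMeans(ref_log_w) - log_w,
    s_k = apply(ref_log_w, 1, sd) * sqrt(1 + 1 / B)
  )
}

# Toy data: two well-separated Gaussian blobs in two dimensions
set.seed(1)
toy <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
             matrix(rnorm(40, mean = 6), ncol = 2))
res <- gap_statistic(toy, k_max = 4, B = 10)
# Tibshirani et al.'s rule: smallest k with Gap(k) >= Gap(k + 1) - s_(k + 1)
best_k <- which(head(res$gap, -1) >= res$gap[-1] - res$s_k[-1])[1]
res
best_k
```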
3.3.9 Davies-Bouldin Index
# Davies-Bouldin index for a 5-cluster PAM partition (lower is better)
cl2 <- pam(dtx, 5)
print(index.DB(dtx, cl2$clustering, centrotypes = "centroids"))
## $DB
## [1] 0.9217699
##
## $r
## [1] 1.2068431 1.2068431 1.0388518 0.8475622 0.3087490
##
## $R
## [,1] [,2] [,3] [,4] [,5]
## [1,] Inf 1.2068431 1.03885184 0.84756224 0.19491166
## [2,] 1.2068431 Inf 0.56056288 0.70977887 0.30874896
## [3,] 1.0388518 0.5605629 Inf 0.37658386 0.08746469
## [4,] 0.8475622 0.7097789 0.37658386 Inf 0.08142941
## [5,] 0.1949117 0.3087490 0.08746469 0.08142941 NaN
##
## $d
## 1 2 3 4 5
## 1 0.00000 28.19528 24.04519 27.79893 83.53662
## 2 28.19528 0.00000 47.17071 35.25616 57.47393
## 3 24.04519 47.17071 0.00000 42.42400 99.43591
## 4 27.79893 35.25616 42.42400 0.00000 89.39108
## 5 83.53662 57.47393 99.43591 89.39108 0.00000
##
## $S
## [1] 16.282261 17.745016 8.697131 7.279063 0.000000
##
## $centers
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,] 60.89062 4.028750 2.794375 0.246875 9.547500 4.231875 0.01 7.655625
## [2,] 46.20000 3.280000 1.228000 0.190000 7.892000 5.422000 0.01 29.964000
## [3,] 80.90833 1.076667 0.665000 0.000000 2.468333 1.230000 0.00 4.418333
## [4,] 53.60167 1.986667 3.596667 0.000000 1.685000 1.310000 0.00 0.780000
## [5,] 16.15000 3.380000 1.230000 0.000000 4.000000 5.380000 0.00 76.620000
## [,9] [,10] [,11]
## [1,] 1.4118750 17.391875 8.657500
## [2,] 2.9000000 25.846000 9.454000
## [3,] 0.5900000 9.368333 3.226667
## [4,] 0.6033333 41.451667 4.460000
## [5,] 1.0800000 19.230000 22.150000
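The Davies-Bouldin index averages, over clusters, each cluster's worst-case ratio R_ij = (S_i + S_j) / d_ij, where S_i is the within-cluster scatter of cluster i and d_ij the distance between centroids i and j; lower values indicate more compact, better-separated clusters. As a check on how the output above fits together, the sketch below recomputes `$r` and `$DB` from the printed `$S` and `$d` components. The values are copied from the output, so small rounding differences are expected.

```r
# Scatter S_i and centroid distances d_ij copied from the index.DB() output above
S <- c(16.282261, 17.745016, 8.697131, 7.279063, 0.000000)
d <- matrix(c(
   0.00000, 28.19528, 24.04519, 27.79893, 83.53662,
  28.19528,  0.00000, 47.17071, 35.25616, 57.47393,
  24.04519, 47.17071,  0.00000, 42.42400, 99.43591,
  27.79893, 35.25616, 42.42400,  0.00000, 89.39108,
  83.53662, 57.47393, 99.43591, 89.39108,  0.00000
), nrow = 5, byrow = TRUE)

R <- outer(S, S, "+") / d            # R_ij = (S_i + S_j) / d_ij
diag(R) <- NA                        # a cluster is not compared with itself
r <- apply(R, 1, max, na.rm = TRUE)  # worst-case ratio for each cluster
DB <- mean(r)                        # Davies-Bouldin index
round(r, 4)
round(DB, 4)
```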
3.3.10 Calinski-Harabasz pseudo F-statistic
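# Calinski-Harabasz index for a 10-cluster PAM partition (higher is better);
# the variable is named cl10 rather than c to avoid masking base::c()
cl10 <- pam(dtx, 10)
index.G1(dtx, cl10$clustering)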
## [1] 17.93938
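`index.G1()` reports the Calinski-Harabasz pseudo F-statistic, CH = [B / (k - 1)] / [W / (n - k)], where B and W are the between- and within-cluster sums of squares; higher values indicate better-separated clusters. A base-R sketch of the formula follows. `ch_index()` is a hypothetical helper, shown on the built-in `iris` measurements rather than the project data; for a k-means fit it can be checked against the `betweenss` and `tot.withinss` components that `kmeans()` returns.

```r
# Hypothetical helper: Calinski-Harabasz pseudo F-statistic
ch_index <- function(x, cl) {
  x <- as.matrix(x)
  n <- nrow(x)
  k <- length(unique(cl))
  grand <- colMeans(x)
  W <- 0                                   # within-cluster sum of squares
  B <- 0                                   # between-cluster sum of squares
  for (g in unique(cl)) {
    xg <- x[cl == g, , drop = FALSE]
    cg <- colMeans(xg)
    W <- W + sum(sweep(xg, 2, cg)^2)
    B <- B + nrow(xg) * sum((cg - grand)^2)
  }
  (B / (k - 1)) / (W / (n - k))
}

set.seed(42)
x <- iris[, 1:4]
fit <- kmeans(x, centers = 3, nstart = 10)
ch <- ch_index(x, fit$cluster)
# Same quantity from the sums of squares stored in the kmeans object
ch_kmeans <- (fit$betweenss / (3 - 1)) / (fit$tot.withinss / (nrow(x) - 3))
c(ch_index = ch, from_kmeans = ch_kmeans)
```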
3.4 K-Means
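df <- scale(dtx)  # standardize each variable (mean 0, sd 1)
set.seed(112233)
km <- kmeans(df, 3, nstart = 25)
# Keep the plot object so its underlying data ($data) can be reused
p <- fviz_cluster(km, data = dtx, repel = TRUE,
                  ellipse.type = "convex")
# Point coordinates and cluster labels used by fviz_cluster
dt <- p$data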
# Count the observations in each cluster
cluster_frequencies <- table(dt$cluster)
# Sort the cluster labels by size, largest first
sorted_clusters <- names(sort(cluster_frequencies, decreasing = TRUE))
# Target labels 1, 2, ..., k
new_order <- seq_along(sorted_clusters)
# Relabel so the largest cluster becomes 1; dt$cluster is a factor, so the original
# level order is kept and table() below prints the labels as 2, 3, 1
dt$cluster <- recode(dt$cluster, !!!setNames(as.character(new_order), sorted_clusters))
table(dt$cluster)
##
## 2 3 1
## 10 3 21
# Compute the convex hull of each cluster with chull()
hull_data <- dt %>%
  group_by(cluster) %>%
  slice(chull(x, y))
# Plot the points and hulls; customize further with regular ggplot2 syntax
ggplot(dt, aes(x, y, colour = cluster)) + geom_point() +
  geom_polygon(data = hull_data, alpha = 0.2, aes(fill = cluster))
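`chull()` (base R, package grDevices) returns the row indices of the points on the convex hull in clockwise order, which is why `slice(chull(x, y))` inside `group_by(cluster)` keeps just the outline of each cluster for `geom_polygon()`. The same per-cluster step can be sketched without dplyr using `split()`; the toy points below are made up, and the interior points are dropped.

```r
# Made-up points: cluster 1 is a square with one interior point,
# cluster 2 is a triangle with one interior point
pts <- data.frame(
  x = c(0, 2, 2, 0, 1, 5, 7, 6, 6),
  y = c(0, 0, 2, 2, 1, 5, 5, 7, 6),
  cluster = factor(c(1, 1, 1, 1, 1, 2, 2, 2, 2))
)
# For each cluster keep only the rows returned by chull(), i.e. the hull vertices
hull_pts <- do.call(rbind, lapply(split(pts, pts$cluster),
                                  function(d) d[chull(d$x, d$y), ]))
hull_pts
```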
3.4.1 Applying K-means with 3 Clusters
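# Note: this fit uses the unstandardized dtx and no nstart, so it is a different clustering
# from `km` above (sizes 17, 2, 15 vs 21, 10, 3); the profiling below uses `km` via dt$cluster
km.res <- kmeans(dtx, centers = 3)
# Print the clustering results
print(km.res)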
## K-means clustering with 3 clusters of sizes 17, 2, 15
##
## Cluster means:
## Gada KG KH GM APB GPL TSN GB
## 1 53.47353 3.800588 2.146471 0.1858824 7.862353 2.616471 0.006470588 10.297059
## 2 25.71500 2.190000 1.280000 0.0400000 3.750000 7.860000 0.000000000 62.870000
## 3 71.19867 2.242000 2.573333 0.1106667 5.332000 3.682667 0.006666667 5.288667
## BJR BB TL
## 1 1.7823529 29.25353 8.856471
## 2 1.3750000 20.87500 14.410000
## 3 0.8186667 12.83933 4.978667
##
## Clustering vector:
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
## 3 3 1 3 1 3 3 3 3 3 1 1 3 1 1 1 3 3 1 1 1 1 1 1 1 1
## 27 28 29 30 31 32 33 34
## 1 3 1 2 3 2 3 3
##
## Within cluster sum of squares by cluster:
## [1] 6830.540 701.771 3280.799
## (between_SS / total_SS = 55.3 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss" "tot.withinss"
## [6] "betweenss" "size" "iter" "ifault"
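`kmeans()` alternates between assigning points to their nearest center and moving each center to the mean of its points. R's default algorithm is Hartigan-Wong, which optimizes the same within-cluster sum-of-squares objective by a different procedure; the base-R sketch below implements the simpler Lloyd iteration instead, only to show the mechanics. `lloyd_kmeans()` is a hypothetical helper and takes user-supplied starting centers. It also shows where `between_SS / total_SS` in the output above comes from: the total sum of squares splits into within- and between-cluster parts, so the ratio is 1 - within / total.

```r
# Hypothetical helper: Lloyd's k-means with given starting centers
lloyd_kmeans <- function(x, centers, max_iter = 100) {
  x <- as.matrix(x)
  centers <- as.matrix(centers)
  k <- nrow(centers)
  cl <- integer(nrow(x))
  for (iter in seq_len(max_iter)) {
    # Assignment step: squared Euclidean distance from every point to every center
    d2 <- sapply(seq_len(k), function(j) colSums((t(x) - centers[j, ])^2))
    new_cl <- max.col(-d2, ties.method = "first")
    if (identical(new_cl, cl)) break          # assignments stable: converged
    cl <- new_cl
    # Update step: move each non-empty cluster's center to the mean of its points
    for (j in seq_len(k)) {
      if (any(cl == j)) centers[j, ] <- colMeans(x[cl == j, , drop = FALSE])
    }
  }
  within <- sapply(seq_len(k), function(j)
    sum((t(x[cl == j, , drop = FALSE]) - centers[j, ])^2))
  total <- sum(scale(x, scale = FALSE)^2)
  list(cluster = cl, centers = centers, withinss = within,
       between_over_total = 1 - sum(within) / total)
}

# Toy data: two well-separated blobs; start from one point in each blob
set.seed(7)
toy <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
             matrix(rnorm(40, mean = 6), ncol = 2))
fit <- lloyd_kmeans(toy, centers = toy[c(1, 21), ])
table(fit$cluster)
fit$between_over_total
```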
3.4.2 Cluster Profiling
data.clust3 <- cbind(dtx, Cluster = dt$cluster)
# Calculate the mean of each variable for each cluster
cluster_profiles3 <- aggregate(. ~ Cluster, data.clust3, mean)
# Print the cluster profiles
print(cluster_profiles3)
## Cluster Gada KG KH GM APB GPL TSN
## 1 2 46.43500 6.009000 2.858000 0.2370000 11.543000 6.261000 0.000000000
## 2 3 58.69667 2.956667 0.690000 0.7366667 6.830000 1.600000 0.060000000
## 3 1 66.09619 1.602857 2.238095 0.0152381 4.058095 2.287143 0.001428571
## GB BJR BB TL
## 1 23.55900 2.5790000 21.39300 12.651000
## 2 18.24333 1.5400000 17.59333 6.693333
## 3 4.27619 0.7104762 22.14000 5.117619
# Convert the data to long format for plotting
cluster_profiles_long3 <- tidyr::pivot_longer(cluster_profiles3, -Cluster,
names_to = "Variable", values_to = "Value")
# Create the bar plot
ggplot(cluster_profiles_long3, aes(x = Cluster, y = Value, fill = Variable)) +
geom_bar(stat = "identity", position = "dodge") +
labs(x = "Cluster", y = "Mean Value", fill = "Variable") +
theme_minimal() +
ggtitle("Cluster Profiles")
data.akhir3 <- cbind(raw.data2[-35,], Cluster = dt$cluster) %>%
relocate(Cluster, .before = 2)
# Radar plot: one polygon per cluster
ggRadar(
  data = data.akhir3,
  mapping = aes(colours = Cluster)
) +
  theme_light() +
  theme(
    text = element_text(size = 10),       # global font size
    title = element_text(size = 12),      # title font size
    axis.text = element_text(size = 10),  # axis label font size
    legend.text = element_text(size = 8)  # legend font size
  )